Loading the data
So, let's load some data. Since it is the topic of this lecture series, why not do a bibliographic mapping of the “innovation system” and “innovation ecosystem” literature? Here I use the Web of Science database of scientific literature, from which I downloaded the results of the following query.
- Data source: Clarivate Analytics Web of Science (http://apps.webofknowledge.com)
- Data format: bibtex
- Query: TOPIC: (“innovation system” OR “systems of innovation” OR “innovation ecosystem”)
- Timespan: the beginning of time - March 2019
- Document Type: Articles
- Language: English
- Query date: March 2019
- Selection: 1000 most cited
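We now read the raw export with readFiles() and convert it to a data frame with the built-in convert2df() function:
# Read the raw export, then convert it into a bibliographic data frame
# (%<>% is the magrittr assignment pipe: it pipes M and overwrites it)
M <- readFiles("https://www.dropbox.com/s/2jh33ktj3ox7ztu/biblio_nw1.txt?dl=1")
M %<>%
  convert2df(dbsource = "isi",
             format = "plaintext")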
##
## Converting your isi collection into a bibliographic dataframe
##
## Articles extracted 100
## Articles extracted 200
## Articles extracted 300
## Articles extracted 400
## Articles extracted 500
## Done!
##
##
## Generating affiliation field tag AU_UN from C1: Done!
M %>% glimpse()
## Observations: 500
## Variables: 64
## $ PT <chr> "J", "J", "J", "J", "J", "J", "J", "J", "J", "J", "J"...
## $ AU <chr> "RUBINOV M;SPORNS O", "LANGFELDER P;HORVATH S", "SMIT...
## $ AF <chr> "RUBINOV, MIKAIL; SPORNS, OLAF", "LANGFELDER, PETER; ...
## $ TI <chr> "COMPLEX NETWORK MEASURES OF BRAIN CONNECTIVITY: USES...
## $ SO <chr> "NEUROIMAGE", "BMC BIOINFORMATICS", "PROCEEDINGS OF T...
## $ LA <chr> "ENGLISH", "ENGLISH", "ENGLISH", "ENGLISH", "ENGLISH"...
## $ DT <chr> "ARTICLE", "ARTICLE", "ARTICLE", "ARTICLE", "ARTICLE"...
## $ ID <chr> "STATE FUNCTIONAL CONNECTIVITY; GRAPH-THEORETICAL ANA...
## $ AB <chr> "BRAIN CONNECTIVITY DATASETS COMPRISE NETWORKS OF BRA...
## $ C1 <chr> "[SPORNS, OLAF] INDIANA UNIV, DEPT PSYCHOL & BRAIN SC...
## $ RP <chr> "SPORNS, O (REPRINT AUTHOR), INDIANA UNIV, DEPT PSYCH...
## $ EM <chr> "OSPORNS@INDIANA.EDU", "PETER.LANGFELDER@GMAIL.COM; S...
## $ RI <chr> "SPORNS, OLAF/A-1667-2010", "KARAYEL, BORA/E-2173-201...
## $ OI <chr> "SPORNS, OLAF/0000-0001-7265-4036", "KARAYEL, BORA/00...
## $ FU <chr> "J.S. MCDONNELL FOUNDATION [JSMF22002082]; CSIRO ICT ...
## $ FX <chr> "WE THANK ROLF KOTTER, PATRIC HAGMANN, AVIAD RUBINSTE...
## $ CR <chr> "ACHARD S, 2006, J NEUROSCI, V26, P63, DOI 10.1523/JN...
## $ NR <chr> "69", "48", "38", "30", "94", "37", "34", "18", "42",...
## $ TC <dbl> 2848, 2152, 2004, 1790, 1274, 752, 703, 682, 672, 601...
## $ Z9 <chr> "2911", "2175", "2021", "1815", "1304", "763", "716",...
## $ U1 <chr> "35", "38", "12", "42", "11", "9", "11", "11", "4", "...
## $ U2 <chr> "393", "39", "208", "431", "130", "151", "150", "123"...
## $ PU <chr> "ACADEMIC PRESS INC ELSEVIER SCIENCE", "BIOMED CENTRA...
## $ PI <chr> "SAN DIEGO", "LONDON", "WASHINGTON", "LONDON", "WASHI...
## $ PA <chr> "525 B ST, STE 1900, SAN DIEGO, CA 92101-4495 USA", "...
## $ SN <chr> "1053-8119", "1471-2105", "0027-8424", "0028-0836", "...
## $ EI <chr> "1095-9572", NA, NA, "1476-4687", NA, "1476-4687", NA...
## $ J9 <chr> "NEUROIMAGE", "BMC BIOINFORMATICS", "P NATL ACAD SCI ...
## $ JI <chr> "NEUROIMAGE", "BMC BIOINFORMATICS", "PROC. NATL. ACAD...
## $ PD <chr> "SEP", "DEC 29", "AUG 4", "NOV 1", "FEB 11", "JUN 16"...
## $ PY <dbl> 2010, 2008, 2009, 2012, 2009, 2011, 2013, 2009, 2009,...
## $ VL <chr> "52", "9", "106", "491", "29", "474", "45", "106", "3...
## $ IS <chr> "3", NA, "31", "7422", "6", "7351", "1", "36", NA, "1...
## $ BP <chr> "1059", NA, "13040", "119", "1860", "380", "25", "152...
## $ EP <chr> "1069", NA, "13045", "124", "1873", "+", "U52", "1527...
## $ DI <chr> "10.1016/J.NEUROIMAGE.2009.10.003", "10.1186/1471-210...
## $ PG <chr> "11", "13", "6", "6", "14", "2", "11", "5", "7", "29"...
## $ WC <chr> "NEUROSCIENCES; NEUROIMAGING; RADIOLOGY, NUCLEAR MEDI...
## $ SC <chr> "NEUROSCIENCES & NEUROLOGY; RADIOLOGY, NUCLEAR MEDICI...
## $ GA <chr> "629FY", "402FP", "479NT", "028PM", "406NC", "777TD",...
## $ UT <chr> "ISI000280181800027", "ISI000262999900002", "ISI00026...
## $ PM <chr> "19819337", "19114008", "19620724", "23128233", "1921...
## $ DA <chr> "2018-10-04", "2018-10-04", "2018-10-04", "2018-10-04...
## $ ER <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "...
## $ AR <chr> NA, "559", NA, NA, NA, NA, NA, NA, NA, NA, NA, "E1000...
## $ OA <chr> NA, "GOLD", "GOLD_OR_BRONZE", "GREEN_ACCEPTED", "GOLD...
## $ DE <chr> NA, NA, "BRAIN CONNECTIVITY; BRAINMAP; FMRI; FUNCTION...
## $ CA <chr> NA, NA, NA, "INT IBD GENETICS CONSORTIUM IIBDGC", NA,...
## $ SU <chr> NA, NA, NA, NA, NA, NA, NA, NA, "S", NA, NA, NA, NA, ...
## $ BE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ BN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ PN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ CT <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ CY <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ CL <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ DB <chr> "ISI", "ISI", "ISI", "ISI", "ISI", "ISI", "ISI", "ISI...
## $ AU_UN <chr> "INDIANA UNIV;UNIV NEW S WALES;UNIV NEW S WALES;QUEEN...
## $ AU1_UN <chr> "INDIANA UNIV", "UNIV CALIF LOS ANGELES", "UNIV OXFOR...
## $ AU_UN_NR <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ SR_FULL <chr> "RUBINOV M, 2010, NEUROIMAGE", "LANGFELDER P, 2008, B...
## $ SR <chr> "RUBINOV M, 2010, NEUROIMAGE", "LANGFELDER P, 2008, B...
To figure out what the fields mean, check the WoS field tags.
Descriptive Analysis
Although bibliometrics is mainly known for quantifying scientific production and measuring its quality and impact, it is also useful for displaying and analysing the intellectual, conceptual and social structures of research, as well as their evolution over time.
In this way, bibliometrics aims to describe how specific disciplines, scientific domains, or research fields are structured and how they evolve. In other words, bibliometric methods help to map science (so-called science mapping) and are very useful for research synthesis, especially for systematic reviews.
Bibliometrics builds on a set of statistical methods that can be used to analyze large collections of scientific publications quantitatively, trace their evolution over time, and discover patterns in them. Network structures are often used to model the interactions among authors, documents, references, keywords, etc.
Bibliometrix is an open-source R package that automates the stages of data analysis and data visualization. After bibliographic data have been imported and converted in R, Bibliometrix performs a descriptive analysis and several research-structure analyses.
Descriptive analysis provides snapshots of the annual research output, the top “k” most productive authors, papers and countries, and the most relevant keywords.
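To make the author rankings a bit more tangible, here is a minimal base-R sketch of full versus fractionalized author counting. It assumes that a fractionalized count gives each author 1/n of a paper with n authors, which is the usual definition; the first two AU strings are taken from the glimpse() output above, the third is made up for illustration.

```r
# Toy AU field in the semicolon-separated form produced by convert2df().
# The third record is hypothetical and only there to create a repeat author.
au <- c("RUBINOV M;SPORNS O",
        "LANGFELDER P;HORVATH S",
        "HORVATH S")

# Split each record into its individual author names.
authors_per_doc <- strsplit(au, ";", fixed = TRUE)
n_authors       <- lengths(authors_per_doc)

# Full counting: every author gets 1 per paper.
full_counts <- table(unlist(authors_per_doc))

# Fractional counting: every author gets 1 / (number of authors) per paper.
weights     <- rep(1 / n_authors, times = n_authors)
frac_counts <- tapply(weights, unlist(authors_per_doc), sum)

full_counts[["HORVATH S"]]  # 2
frac_counts[["HORVATH S"]]  # 1.5
```

A handy sanity check is that the fractional counts always sum to the number of documents.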
Main findings about the collection
results <- biblioAnalysis(M)
summary(results,
        k = 20,
        pause = FALSE)
##
##
## Main Information about data
##
## Documents 500
## Sources (Journals, Books, etc.) 268
## Keywords Plus (ID) 2480
## Author's Keywords (DE) 1200
## Period 2008 - 2016
## Average citations per documents 150.6
##
## Authors 3562
## Author Appearances 3889
## Authors of single-authored documents 27
## Authors of multi-authored documents 3535
## Single-authored documents 28
##
## Documents per Author 0.14
## Authors per Document 7.12
## Co-Authors per Documents 7.78
## Collaboration Index 7.49
##
## Document types
## ARTICLE 478
## ARTICLE; BOOK CHAPTER 4
## ARTICLE; PROCEEDINGS PAPER 17
## ARTICLE; RETRACTED PUBLICATION 1
##
##
## Annual Scientific Production
##
## Year Articles
## 2008 65
## 2009 92
## 2010 83
## 2011 79
## 2012 66
## 2013 38
## 2014 40
## 2015 27
## 2016 10
##
## Annual Percentage Growth Rate -20.86186
##
##
## Most Productive Authors
##
## Authors Articles Authors Articles Fractionalized
## 1 HORVATH S 20 HORVATH S 3.88
## 2 GESCHWIND DH 12 LEYDESDORFF L 2.33
## 3 LANGFELDER P 8 DEARING JW 2.00
## 4 MILLER JA 7 LANGFELDER P 1.92
## 5 HE Y 6 GESCHWIND DH 1.66
## 6 BORSBOOM D 5 BODIN O 1.50
## 7 COPPOLA G 5 BOSCHMA R 1.50
## 8 ZHANG B 5 DAWSON S 1.50
## 9 BASSETT DS 4 DING Y 1.50
## 10 BULLMORE ET 4 ERNSTSON H 1.33
## 11 CHO JH 4 INGOLD K 1.33
## 12 GAO FY 4 JORDAN F 1.25
## 13 KNIGHT R 4 BRANDES U 1.17
## 14 LEYDESDORFF L 4 BLUTHGEN N 1.14
## 15 MENON V 4 BORSBOOM D 1.13
## 16 MILL J 4 MILLER JA 1.13
## 17 OLDHAM MC 4 SCHENSUL JJ 1.09
## 18 OPHOFF RA 4 MENON V 1.07
## 19 SAITO K 4 HE Y 1.06
## 20 SMITH SM 4 ASHTON W 1.00
##
##
## Top manuscripts per citations
##
## Paper TC TCperYear
## 1 RUBINOV M, 2010, NEUROIMAGE 2848 316.4
## 2 LANGFELDER P, 2008, BMC BIOINFORMATICS 2152 195.6
## 3 SMITH SM, 2009, P NATL ACAD SCI USA 2004 200.4
## 4 JOSTINS L, 2012, NATURE 1790 255.7
## 5 BUCKNER RL, 2009, J NEUROSCI 1274 127.4
## 6 VOINEAGU I, 2011, NATURE 752 94.0
## 7 DELOUKAS P, 2013, NAT GENET 703 117.2
## 8 EAGLE N, 2009, P NATL ACAD SCI USA 682 68.2
## 9 CHEN J, 2009, NUCLEIC ACIDS RES 672 67.2
## 10 THIELE I, 2010, NAT PROTOC 601 66.8
## 11 FRANSSON P, 2008, NEUROIMAGE 572 52.0
## 12 SUPEKAR K, 2008, PLOS COMPUT BIOL 539 49.0
## 13 XUE J, 2014, IMMUNITY 531 106.2
## 14 FOWLER JH, 2008, BRIT MED J 503 45.7
## 15 MILL J, 2008, AM J HUM GENET 480 43.6
## 16 BAILEY P, 2016, NATURE 452 150.7
## 17 AIROLDI EM, 2008, J MACH LEARN RES 443 40.3
## 18 SUPEKAR K, 2009, PLOS BIOL 413 41.3
## 19 BARBERAN A, 2012, ISME J 383 54.7
## 20 GARDY JL, 2011, NEW ENGL J MED 369 46.1
##
##
## Corresponding Author's Countries
##
## Country Articles Freq SCP MCP MCP_Ratio
## 1 USA 231 0.46293 161 70 0.303
## 2 CHINA 35 0.07014 18 17 0.486
## 3 UNITED KINGDOM 34 0.06814 16 18 0.529
## 4 NETHERLANDS 27 0.05411 15 12 0.444
## 5 GERMANY 26 0.05210 14 12 0.462
## 6 CANADA 20 0.04008 9 11 0.550
## 7 ITALY 18 0.03607 7 11 0.611
## 8 AUSTRALIA 16 0.03206 6 10 0.625
## 9 SPAIN 11 0.02204 3 8 0.727
## 10 SWEDEN 11 0.02204 6 5 0.455
## 11 SWITZERLAND 10 0.02004 6 4 0.400
## 12 FRANCE 7 0.01403 4 3 0.429
## 13 KOREA 7 0.01403 4 3 0.429
## 14 JAPAN 6 0.01202 6 0 0.000
## 15 BELGIUM 5 0.01002 1 4 0.800
## 16 AUSTRIA 4 0.00802 2 2 0.500
## 17 IRELAND 4 0.00802 2 2 0.500
## 18 FINLAND 3 0.00601 1 2 0.667
## 19 BRAZIL 2 0.00401 0 2 1.000
## 20 CUBA 2 0.00401 1 1 0.500
##
##
## SCP: Single Country Publications
##
## MCP: Multiple Country Publications
##
##
## Total Citations per Country
##
## Country Total Citations Average Article Citations
## 1 USA 39460 170.8
## 2 UNITED KINGDOM 7023 206.6
## 3 CHINA 3819 109.1
## 4 CANADA 3440 172.0
## 5 GERMANY 3344 128.6
## 6 NETHERLANDS 3132 116.0
## 7 AUSTRALIA 2128 133.0
## 8 ITALY 2046 113.7
## 9 SWEDEN 1502 136.5
## 10 SPAIN 1265 115.0
## 11 SWITZERLAND 1141 114.1
## 12 JAPAN 1002 167.0
## 13 FRANCE 801 114.4
## 14 KOREA 735 105.0
## 15 IRELAND 650 162.5
## 16 AUSTRIA 540 135.0
## 17 BELGIUM 389 77.8
## 18 GREECE 384 192.0
## 19 FINLAND 324 108.0
## 20 INDIA 280 140.0
##
##
## Most Relevant Sources
##
## Sources Articles
## 1 PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA 25
## 2 PLOS ONE 22
## 3 NEUROIMAGE 15
## 4 NATURE 10
## 5 ISME JOURNAL 9
## 6 NUCLEIC ACIDS RESEARCH 9
## 7 CELL 7
## 8 GENOME RESEARCH 7
## 9 BIOINFORMATICS 6
## 10 BMC BIOINFORMATICS 6
## 11 PLOS GENETICS 6
## 12 BRAIN 5
## 13 CANCER RESEARCH 5
## 14 JOURNAL OF INFORMETRICS 5
## 15 MOLECULAR SYSTEMS BIOLOGY 5
## 16 BMC GENOMICS 4
## 17 DECISION SUPPORT SYSTEMS 4
## 18 EXPERT SYSTEMS WITH APPLICATIONS 4
## 19 JOURNAL OF NEUROSCIENCE 4
## 20 LANDSCAPE AND URBAN PLANNING 4
##
##
## Most Relevant Keywords
##
## Author Keywords (DE) Articles Keywords-Plus (ID) Articles
## 1 SOCIAL NETWORK ANALYSIS 43 NETWORK ANALYSIS 41
## 2 NETWORK ANALYSIS 41 EXPRESSION 32
## 3 GRAPH THEORY 14 GENE EXPRESSION 29
## 4 SOCIAL NETWORKS 13 NETWORKS 26
## 5 SYSTEMS BIOLOGY 10 ORGANIZATION 25
## 6 FUNCTIONAL CONNECTIVITY 9 IDENTIFICATION 24
## 7 CONNECTIVITY 7 COMPLEX NETWORKS 22
## 8 FMRI 7 CENTRALITY 21
## 9 NETWORK 7 DISEASE 21
## 10 CENTRALITY 6 DYNAMICS 20
## 11 RESTING STATE 6 PATTERNS 17
## 12 TRACTOGRAPHY 6 ALZHEIMERS DISEASE 16
## 13 CLUSTERING 5 EVOLUTION 16
## 14 MICROARRAY 5 MODEL 16
## 15 NETWORKS 5 COMMUNITY STRUCTURE 15
## 16 COMMUNITY 4 ESCHERICHIA COLI 15
## 17 COMPLEX NETWORKS 4 FUNCTIONAL CONNECTIVITY 15
## 18 DIFFUSION TENSOR IMAGING 4 PERFORMANCE 15
## 19 GENE EXPRESSION 4 BEHAVIOR 14
## 20 METABOLOMICS 4 MASS SPECTROMETRY 14
plot(results)
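Two of the numbers in the summary can be recomputed by hand. The sketch below assumes that the annual percentage growth rate is a compound annual growth rate between the first and the last year, and that the MCP ratio is simply MCP divided by the number of articles; both assumptions are consistent with the printed values, but this is my reading and not a statement about the package internals.

```r
# Yearly article counts copied from the "Annual Scientific Production" table.
years    <- 2008:2016
articles <- c(65, 92, 83, 79, 66, 38, 40, 27, 10)

# Compound annual growth rate (in percent) between the first and last year.
n_periods   <- length(years) - 1
growth_rate <- ((articles[length(articles)] / articles[1])^(1 / n_periods) - 1) * 100
round(growth_rate, 2)  # -20.86

# MCP ratio: share of a country's articles with co-authors from other countries.
mcp_ratio <- function(mcp, articles) mcp / articles
round(mcp_ratio(mcp = 70, articles = 231), 3)  # USA row: 0.303
```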





Most Cited References (internally)
CR <- citations(M,
                field = "article",
                sep = ";")
cbind(CR$Cited[1:10]) %>% head()
## [,1]
## WASSERMAN S, 1994, SOCIAL NETWORK ANAL 63
## WATTS DJ, 1998, NATURE, V393, P440, DOI 10.1038/30918 49
## ZHANG B, 2005, STAT APPL GENET MO B, V4, DOI 10.2202/1544-6115.1128 47
## FREEMAN LC, 1979, SOC NETWORKS, V1, P215, DOI 10.1016/0378-8733(78)90021-7 42
## LANGFELDER P, 2008, BMC BIOINFORMATICS, V9, DOI 10.1186/1471-2105-9-559 37
## SHANNON P, 2003, GENOME RES, V13, P2498, DOI 10.1101/GR.1239303 29
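Conceptually, this count is a tally of the reference strings in the CR field. Here is a rough base-R sketch on toy data; the reference strings are shortened stand-ins, and citations() may clean and match the strings differently.

```r
# Toy CR field: one string per citing document, references separated by ";".
# The strings are shortened, illustrative stand-ins for real CR entries.
cr <- c("WASSERMAN S, 1994, SOCIAL NETWORK ANAL; WATTS DJ, 1998, NATURE",
        "WATTS DJ, 1998, NATURE; FREEMAN LC, 1979, SOC NETWORKS",
        "WASSERMAN S, 1994, SOCIAL NETWORK ANAL; WATTS DJ, 1998, NATURE")

# Split on the separator, trim whitespace, and count each reference
# at most once per citing document.
refs_per_doc <- lapply(strsplit(cr, ";", fixed = TRUE),
                       function(x) unique(trimws(x)))

# Tally across documents and sort from most to least cited.
cited <- sort(table(unlist(refs_per_doc)), decreasing = TRUE)
head(cited, 3)
```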
The conceptual structure and context - Co-Word Analysis
Co-word networks show the conceptual structure of a field: they uncover links between concepts through term co-occurrences.
The conceptual structure is often used to understand the topics covered by scholars (the so-called research front) and to identify the most important and most recent issues.
Dividing the whole timespan into time slices and comparing their conceptual structures is useful for analyzing the evolution of topics over time.
Bibliometrix can analyze keywords, but also the terms in the articles’ titles and abstracts. It does so using network analysis, correspondence analysis (CA), or multiple correspondence analysis (MCA). CA and MCA visualise the conceptual structure in a two-dimensional plot.
We can do far fancier things with abstracts or full texts (and we do). However, I don't want to spoil Roman's sessions, so I will hold back here.
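To see what a keyword co-occurrence network is made of, here is a small base-R sketch. It builds a document-by-keyword incidence matrix A and multiplies it with itself, B = t(A) %*% A; the keyword strings are toy data, and the real functions add cleaning steps on top of this idea.

```r
# Toy keyword field, one string per document (illustrative values only).
kw <- c("NETWORK ANALYSIS;CENTRALITY",
        "NETWORK ANALYSIS;GENE EXPRESSION",
        "NETWORK ANALYSIS;CENTRALITY;GENE EXPRESSION")

kw_per_doc <- strsplit(kw, ";", fixed = TRUE)
terms      <- sort(unique(unlist(kw_per_doc)))

# Document x keyword incidence matrix: A[d, k] = 1 if document d uses keyword k.
A <- t(sapply(kw_per_doc, function(x) as.integer(terms %in% x)))
colnames(A) <- terms

# Keyword co-occurrence matrix B = t(A) %*% A.
# Off-diagonal cells count shared documents; the diagonal counts occurrences.
B <- crossprod(A)
B
```

In this toy example CENTRALITY and NETWORK ANALYSIS share two documents, so their cell in B is 2.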
Co-word Analysis through Keyword co-occurrences
Plot options:
- normalize = "association" (the vertex similarities are normalized using association strength)
- n = 50 (the function plots the main 50 vertices, here keywords)
- type = "fruchterman" (the network layout is generated using the Fruchterman-Reingold algorithm)
- size.cex = TRUE (the size of the vertices is proportional to their degree)
- size = 20 (the max size of the vertices)
- remove.multiple = FALSE (multiple edges are not removed)
- labelsize = 3 (defines the max size of vertex labels)
- label.cex = TRUE (the vertex label sizes are proportional to their degree)
- edgesize = 10 (the thickness of the edges is proportional to their strength; edgesize defines the max value of the thickness)
- label.n = 50 (labels are plotted only for the main 50 vertices)
- edges.min = 2 (plots only edges with a strength greater than or equal to 2)
- all other arguments assume the default values
NetMatrix <- biblioNetwork(M,
                           analysis = "co-occurrences",
                           network = "keywords",
                           sep = ";")
# net <- networkPlot(NetMatrix,
#                    normalize = "association",
#                    n = 50,
#                    Title = "Keyword Co-occurrences",
#                    type = "fruchterman",
#                    size.cex = TRUE, size = 20, remove.multiple = FALSE,
#                    edgesize = 10,
#                    labelsize = 3,
#                    label.cex = TRUE,
#                    label.n = 50,
#                    edges.min = 2)
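The "association" normalization deserves a quick illustration. The sketch below uses the common definition of association strength, the co-occurrence count divided by the product of the two keywords' occurrences; the matrix values are made up, and I am assuming (not verifying) that the package follows the same formula.

```r
# Small symmetric co-occurrence matrix; the diagonal holds keyword occurrences.
# All values are illustrative and not taken from the collection.
kw <- c("CENTRALITY", "GENE EXPRESSION", "NETWORK ANALYSIS")
B <- matrix(c(2, 1, 2,
              1, 2, 2,
              2, 2, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(kw, kw))

# Association strength: s_ij = c_ij / (c_ii * c_jj).
occ <- diag(B)
S   <- B / outer(occ, occ)
round(S, 3)
```

The point of the normalization is that two frequent keywords need many joint appearances to look strongly related, while rare keywords that always appear together score high.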
Thematic Map
Co-word analysis yields clusters of keywords. These clusters are treated as themes, whose density and centrality can be used to classify them and to map them in a two-dimensional diagram.
A thematic map is a very intuitive plot in which we can interpret themes according to the quadrant they fall into: (1) upper-right quadrant: motor themes; (2) lower-right quadrant: basic themes; (3) lower-left quadrant: emerging or disappearing themes; (4) upper-left quadrant: very specialized/niche themes.
For details, please see Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146-166.
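As a rough illustration of the two axes, here is a base-R sketch of Callon's centrality (10 times the sum of a theme's external link strengths) and density (100 times the sum of its internal link strengths divided by the number of keywords in the theme), as described in Cobo et al. (2011). The link matrix, the cluster assignment, and the use of the mean as quadrant cut-off are all assumptions made for this example; thematicMap() may compute and scale things differently.

```r
# Toy symmetric link-strength matrix between six keywords (illustrative values).
kw <- c("a", "b", "c", "d", "e", "f")
E <- matrix(0, 6, 6, dimnames = list(kw, kw))
E["a", "b"] <- 0.8; E["a", "c"] <- 0.6; E["b", "c"] <- 0.7  # inside theme 1
E["d", "e"] <- 0.2                                            # inside theme 2
E["c", "d"] <- 0.5; E["b", "f"] <- 0.1                        # between themes
E <- E + t(E)

# Hypothetical assignment of each keyword to a theme (cluster).
theme <- c(a = 1, b = 1, c = 1, d = 2, e = 2, f = 3)

callon <- function(E, theme) {
  ids <- sort(unique(theme))
  out <- t(sapply(ids, function(g) {
    inside   <- theme == g
    internal <- sum(E[inside, inside]) / 2   # each internal link counted once
    external <- sum(E[inside, !inside])      # links leaving the theme
    c(theme      = g,
      centrality = 10 * external,
      density    = 100 * internal / sum(inside))
  }))
  as.data.frame(out)
}

themes <- callon(E, theme)

# Assign quadrants relative to the mean of each axis (cut-off is an assumption).
themes$quadrant <- with(themes, ifelse(
  centrality >= mean(centrality),
  ifelse(density >= mean(density), "motor", "basic"),
  ifelse(density >= mean(density), "niche", "emerging or declining")))
themes
```

In this toy setup, theme 1 is both well connected and internally dense (a motor theme), theme 2 is well connected but loosely knit (a basic theme), and theme 3 is weak on both axes.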
NetMatrix <- biblioNetwork(M,
                           analysis = "co-occurrences",
                           network = "keywords",
                           sep = ";")
S <- normalizeSimilarity(NetMatrix,
                         type = "association")
Map <- thematicMap(M,
                   minfreq = 5)
plot(Map$map)

Let's inspect the clusters we found:
clusters <- Map$words %>%
  arrange(Cluster, desc(Occurrences))
clusters %>%
  select(Cluster, Words, Occurrences) %>%
  group_by(Cluster) %>%
  mutate(n.rel = Occurrences / sum(Occurrences)) %>%
  slice(1:3)
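If the pipeline above looks opaque, the following base-R sketch does the same steps on a toy stand-in for Map$words. The column names mirror the ones used above; the clusters, words and counts are made up.

```r
# Toy stand-in for Map$words; all values are hypothetical.
words <- data.frame(
  Cluster     = c(1, 1, 1, 1, 2, 2),
  Words       = c("network", "centrality", "graph", "model",
                  "expression", "disease"),
  Occurrences = c(40, 20, 10, 10, 30, 20)
)

# Order by cluster and descending frequency, as arrange() does.
words <- words[order(words$Cluster, -words$Occurrences), ]

# Share of each keyword within its cluster (the n.rel column).
words$n.rel <- words$Occurrences /
  ave(words$Occurrences, words$Cluster, FUN = sum)

# Keep the three most frequent keywords per cluster, as slice(1:3) does.
rank_in_cluster <- ave(seq_len(nrow(words)), words$Cluster, FUN = seq_along)
top3 <- words[rank_in_cluster <= 3, ]
top3
```

Because n.rel is computed before the top three are selected, the shares refer to the whole cluster and need not sum to one in the final table.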